Linear combination of one-step predictive information with an external reward in an episodic policy gradient setting: a critical analysis
Abstract
One of the main challenges in the field of embodied artificial intelligence is the open-ended autonomous learning of complex behaviors. Our approach is to use task-independent, information-driven intrinsic motivation(s) to support task-dependent learning. The work presented here is a preliminary step in which we investigate the predictive information (the mutual information between the past and the future of the sensor stream) as an intrinsic drive, ideally supporting any kind of task acquisition. Previous experiments have shown that the predictive information (PI) is a good candidate to support autonomous, open-ended learning of complex behaviors, because maximizing the PI corresponds to exploring morphology- and environment-dependent behavioral regularities. The idea is that these regularities can then be exploited to solve any given task. Three different experiments are presented, and their results lead to the conclusion that a linear combination of the one-step PI with an external reward function is not generally recommended in an episodic policy gradient setting. Only for hard tasks can a substantial speed-up be achieved, and then at the cost of a loss in asymptotic performance.
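To make the quantity under discussion concrete, the following is a minimal sketch of a linearly combined episode return. It rests on several assumptions that the abstract does not state: the sensor stream is taken to be already discretized, the one-step PI is estimated with a simple plug-in (frequency-count) estimator of the mutual information between consecutive sensor states, and the combination is applied once per episode. The function names and the weight `beta` are hypothetical.

```python
import math
from collections import Counter


def one_step_pi(sensor_stream):
    """Plug-in estimate (in bits) of I(S_t; S_{t+1}) for a discrete sequence.

    Assumption: sensor values are already discretized and hashable.
    """
    pairs = list(zip(sensor_stream[:-1], sensor_stream[1:]))
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)                    # counts of (s_t, s_{t+1})
    past = Counter(s for s, _ in pairs)       # counts of s_t
    future = Counter(s for _, s in pairs)     # counts of s_{t+1}
    pi = 0.0
    for (s, s_next), count in joint.items():
        # p(s, s') / (p(s) p(s')) expressed with raw counts
        ratio = count * n / (past[s] * future[s_next])
        pi += (count / n) * math.log2(ratio)
    return pi


def combined_return(external_rewards, sensor_stream, beta):
    """Episode return: summed external reward plus beta times the one-step PI.

    `beta` is a hypothetical weighting factor, not a value from the paper.
    """
    return sum(external_rewards) + beta * one_step_pi(sensor_stream)


if __name__ == "__main__":
    alternating = [0, 1, 0, 1, 0, 1, 0, 1, 0]  # fully predictable, varied
    constant = [0, 0, 0, 0]                    # predictable, no variety
    print(one_step_pi(alternating))
    print(one_step_pi(constant))
    print(combined_return([1.0, 2.0], alternating, beta=0.5))
```

In this sketch a perfectly alternating binary stream yields a higher PI than a constant one, which illustrates why maximizing the PI favors behavior that is both varied and predictable. Under these assumptions, the episodic policy gradient would then be computed on the combined return rather than on the external reward alone.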
Similar resources
Information-driven intrinsic motivation in reinforcement learning
One of the main challenges in the field of embodied artificial intelligence is the open-ended autonomous learning of complex behaviours. Our approach is to use task-independent, information-driven intrinsic motivation(s) to support task-dependent learning. The work presented here is a preliminary step in which we investigate the predictive information (the mutual information of the past and fut...
Identifying key steps in developing a one-stop shop for health policy and system information in a limited-resource setting: A case study
Background: There is limited understanding about the development of the online one-stop shops for evidence in a limited-resource setting, such as Uganda. This study aimed to provide a comprehensive account of the development process of the online resource for local policy and systems-relevant information in this setting. Methods: We utilized a case study design to address our objective where ...
Power and Agenda-Setting in Tanzanian Health Policy: An Analysis of Stakeholder Perspectives
Background Global health policy is created largely through a collaborative process between development agencies and aid-recipient governments, yet it remains unclear whether governments retain ownership over the creation of policy in their own countries. An assessment of the power structure in this relationship and its influence over agenda-setting is thus the first step towards understanding w...
Implementing Bounded Linear Programming and Analytical Network Process Fuzzy Models to Motivate Employees: a Case Study
In this research, the factors affecting university employees’ motivation and productivity are identified and classified into seven groups; the impact of each motivation factor on productivity is presented by an ANP fuzzy model. Eight universities in Iran were analyzed in this research work. The aim of this study is to explore the productivity of employees. This paper attempts to give new insights ...
Finite sample analysis of the GTD Policy Evaluation Algorithms in Markov Setting
In reinforcement learning (RL), one of the key components is policy evaluation, which aims to estimate the value function (i.e., expected long-term accumulated reward) of a policy. With a good policy evaluation method, the RL algorithms will estimate the value function more accurately and find a better policy. When the state space is large or continuous, Gradient-based Temporal Difference (GTD) ...